Hierarchical Topics in Texts Generated by a Stream

Abstract

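We observe a stream of text messages, generated by Twitter or by a text file, and present a tool which constructs a dynamic list of topics. Each tweet generates edges of a graph where the nodes are the tags and the edges link the author of the tweet with the tags present in the tweet. We consider the large clusters of the graph and approximate the stream of edges with Reservoir sampling. We study the giant components of the Reservoir, and each large component represents a topic. The nodes of high degree and their edges provide the first layer of a topic, and iteration over the nodes provides a hierarchical decomposition. For a standard text, we use Weighted Reservoir sampling, where the weight is the similarity between words given by Word2vec. We use overlapping windows and build the topicalization on each window. We compare this approach with the Word2content and LDA techniques in the case of a standard text viewed as a stream.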
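The pipeline described in the abstract (stream of edges, reservoir sample, large connected components) can be illustrated with a minimal Python sketch. It is not the authors' tool: the function names `tweet_edges`, `reservoir_sample` and `connected_components`, the sample tweets, and the reservoir size are all hypothetical, and the sketch assumes plain uniform reservoir sampling (the classic "Algorithm R") plus a union-find pass to group the sampled edges into components.

```python
import random
from collections import defaultdict


def tweet_edges(author, tags):
    """Turn one tweet into graph edges linking its author to each of its tags."""
    return [(author, tag) for tag in tags]


def reservoir_sample(stream, k, rng):
    """Keep a uniform random sample of at most k items from a stream (Algorithm R)."""
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            reservoir.append(item)  # fill the reservoir first
        else:
            j = rng.randint(0, i)  # randint is inclusive on both ends
            if j < k:
                reservoir[j] = item  # replace a slot with probability k / (i + 1)
    return reservoir


def connected_components(edges):
    """Group the nodes of an undirected edge list into components, largest first."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for u, v in edges:
        parent[find(u)] = find(v)  # union the two endpoints

    groups = defaultdict(set)
    for node in list(parent):
        groups[find(node)].add(node)
    return sorted(groups.values(), key=len, reverse=True)


# Hypothetical tweets: (author, tags) pairs invented for illustration.
tweets = [
    ("@alice", ["#nlp", "#topics"]),
    ("@bob", ["#topics", "#streams"]),
    ("@carol", ["#football"]),
]
edge_stream = (edge for author, tags in tweets for edge in tweet_edges(author, tags))

# Assumed reservoir size of 4; a real stream would be far longer than the reservoir.
sample = reservoir_sample(edge_stream, k=4, rng=random.Random(0))
for component in connected_components(sample):
    print(sorted(component))
```

In this reading, each large component printed at the end would play the role of a candidate topic. The weighted variant mentioned for standard text would replace the uniform replacement rule with one biased by a word-similarity weight, which this sketch does not attempt.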

Similar resources

Identifying Multiple Topics in Texts

In this paper, we present an innovative method for multi-label text classification. Our method uses Lucene to index texts and then assigns one or more classes to a new text based on its similarity relative to an annotated corpus. For finer granularity, we split the text into phrases, and then we focus on the noun phrases. Instead of classifying the entire text, we classify each noun phrase. The...

Searching for Topics in a Large Collection of Texts

We describe an original method that automatically finds specific topics in a large collection of texts. Each topic is first identified as a specific cluster of texts and then represented as a virtual concept, which is a weighted mixture of words. Our intention is to employ these virtual concepts in document indexing. In this paper we show some preliminary experimental results and discuss direct...

Extracting topics in texts: Towards a fuzzy logic approach

The paper presents a preliminary investigation of potential methods for extracting semantic views of text contents, which go beyond standard statistical indexation. The aim is to build kinds of fuzzily weighted structured images of semantic contents. A preliminary step consists in identifying the different types of relations (is-a, part-of, related-to, synonymy, domain, glossary relations) that...

Extracting Topics from Texts Based on Situations

To understand text, we must relate it with specified situations. This paper, on the basis of such an idea, discusses how the things that a text describes and the situation that the text relates to are expressed in a computer and how the topic of a text is extracted.

Uncover Trending Topics on Data Stream by Linear Prediction Modeling

The use of streaming data to analyze and discern patterns and so make better decisions is becoming the basis for creating significant value for companies. Torrents of continuously flooding data force organizations to understand what information truly counts and to analyze what they can do with that information. The aim of this paper is to explore the adaptation of a linear prediction model to discover...

Journal

Journal title: International Journal on Natural Language Computing

Year: 2022

ISSN: 2278-1307, 2319-4111

DOI: https://doi.org/10.5121/ijnlc.2022.11503